9 research outputs found

Fast matching statistics in small space

Author: Belazzougui Djamal
Cunial Fabio
Denas Olgert
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 17th International Symposium on Experimental Algorithms (SEA 2018)
Publication date: 01/01/2018

Computing the matching statistics of a string S with respect to a string T on an alphabet of size σ is a fundamental primitive for a number of large-scale string analysis applications, including the comparison of entire genomes, for which space is a pressing issue. This paper takes from theory to practice an existing algorithm that uses just O(|T| log σ) bits of space, and that computes a compact encoding of the matching statistics array in O(|S| log σ) time. The techniques used to speed up the algorithm are of general interest, since they optimize queries on the existence of a Weiner link from a node of the suffix tree, and parent operations after unsuccessful Weiner links. Thus, they can be applied to other matching statistics algorithms, as well as to any suffix tree traversal that relies on such calls. Some of our optimizations yield a matching statistics implementation that is up to three times faster than a plain version of the algorithm, depending on the similarity between S and T. In genomic datasets of practical significance we achieve speedups of up to 1.8, but our fastest implementations take on average twice the time of an existing code based on the LCP array. The key advantage is that our implementations need between one half and one fifth of the competitor's memory, and they approach comparable running times when S and T are very similar.
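
For readers unfamiliar with the term, the sketch below illustrates what a matching statistics array is, using the standard definition: MS[i] is the length of the longest prefix of S[i..] that occurs somewhere in T. It is a naive illustration only and is not the space-efficient algorithm the abstract describes (it uses no suffix tree, Weiner links, or compact encoding). The function name `matching_statistics` is an assumption made for this example.

```python
def matching_statistics(s: str, t: str) -> list[int]:
    """Return MS, where MS[i] is the length of the longest prefix
    of s[i:] that occurs as a substring of t (naive illustration)."""
    ms = []
    length = 0
    for i in range(len(s)):
        # If s[i-1 : i-1+length] occurs in t, then s[i : i-1+length] does too,
        # so MS[i] >= MS[i-1] - 1 and we can resume from that length.
        length = max(length - 1, 0)
        # Extend the match one character at a time while it still occurs in t.
        while i + length < len(s) and s[i:i + length + 1] in t:
            length += 1
        ms.append(length)
    return ms


print(matching_statistics("banana", "ananas"))  # [0, 5, 4, 3, 2, 1]
```

The substring test `in t` rescans T on every step, so this sketch is far slower than the O(|S| log σ) bound quoted in the abstract; it is meant only to make the definition concrete.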

Dagstuhl Research Online Publication Server

MPG.PuRe

Fast algorithms for computing sequence distances by exhaustive substring composition

Author: A Apostolico
A Kolmogorov
A Lempel
Alberto Apostolico
B Blaisdell
B Hao
H Otu
I Ulitsky
J Na
J Qi
JV Helden
L Brillouin
LL Gatlin
M Höhl
M Li
Olgert Denas
P Ferragina
R Edgar
R von Mises
S Vinga
TJ Wu
TM Cover
Publication venue: BioMed Central
Publication date: 01/10/2008

The increasing throughput of sequencing raises growing needs for methods of sequence analysis and comparison on a genomic scale, notably, in connection with phylogenetic tree reconstruction. Such needs are hardly fulfilled by the more traditional measures of sequence similarity and distance, like string edit and gene rearrangement, due to a mixture of epistemological and computational problems. Alternative measures, based on the subword composition of sequences, have emerged in recent years and proved to be both fast and effective in a variety of tested cases. The common denominator of such measures is an underlying information theoretic notion of relative compressibility. Their viability depends critically on computational cost. The present paper describes as a paradigm the extension and efficient implementation of one of the methods in this class. The method is based on the comparison of the frequencies of all subwords in the two input sequences, where frequencies are suitably adjusted to take into account the statistical background.
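
As a rough illustration of what a composition-based comparison looks like, the sketch below compares two sequences by the relative frequencies of their fixed-length subwords (k-mers). It is a deliberately simplified stand-in, not the method of the paper: the paper considers all subwords and adjusts frequencies for a statistical background, whereas this example fixes one length k, applies no background correction, and uses a Euclidean distance chosen purely for illustration. The names `kmer_frequencies` and `composition_distance` are assumptions.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of every length-k subword of seq."""
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: c / total for kmer, c in counts.items()}


def composition_distance(a: str, b: str, k: int) -> float:
    """Euclidean distance between the k-mer frequency profiles of a and b.
    (Illustrative choice of distance; no background adjustment.)"""
    fa = kmer_frequencies(a, k)
    fb = kmer_frequencies(b, k)
    words = fa.keys() | fb.keys()
    return sqrt(sum((fa.get(w, 0.0) - fb.get(w, 0.0)) ** 2 for w in words))


print(composition_distance("ACGTACGT", "ACGTTCGT", 2))
```

Identical sequences give a distance of 0, and sequences with no k-mer in common give the largest values, which is the basic behavior any composition-based measure is expected to show.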

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The Genome Sequence of the Leaf-Cutter Ant Atta cephalotes Reveals Insights into Its Obligate Symbiotic Lifestyle

Author: A Marchler-Bauer
A Zemach
AA Pinto-Tomas
AD Blake
AE Little
AL Price
Amy Cavanaugh
AV Lukashin
B Hölldobler
BL Cantarel
Brian R. Johnson
C Claudianos
Cameron R. Currie
Carson Holt
CD Smith
Chris R. Smith
Christine G. Elsik
Christopher D. Smith
CJ Mungall
CL Frost
Clotilde Teiling
CR Currie
CR Smith
D Wagner
D-C Oh
Dan Graur
Darren E. Hagen
DC De Graaf
E Abouheif
EF Kirkness
Ehab Abouheif
EI Boyle
Elizabeth Cash
EO Wilson
Eran Elhaik
Eric J. Caldera
Erich Bornberg-Bauer
Fabian Zimmer
FC Pagnocca
G Fowler
G Meister
G Parra
G Slater
G Suen
Garret Suen
George M. Weinstock
Gregory Copenhaver
H Fernandez-Marin
Hao Hu
I Korf
J Chen
J Jurka
J Martins
James Taylor
Jarrod J. Scott
Jay Kim
JD Evans
JH Werren
Joseph A. Moeller
Joshua D. Gibson
JR Miller
Justin T. Reese
Jürgen Gadau
K Forstemann
Kirk J. Grubbs
L Li
Lewyn Li
LM Field
Lothar Wissler
Lumi Viljakainen
M Ashburner
M Bacci Jr
M Beye
M Kanehisa
M Margulies
M Poulsen
M Stanke
Marguerite C. Murphy
Marie-Julie Favé
Mark D. Yandell
Martin Helmkampf
MD Adams
MD Drapeau
Meredith C. Naughton
MH Haydak
MM Martin
MN Becker
Mónica C. Muñoz-Torres
N Elango
N Tsutsui
NA Weber
Neil D. Tsutsui
Nicole M. Gerardo
Olgert Denas
P Vilmos
Pascal Bouffard
Q Wu
R Bonasio
R Feyereisen
R Gil
R Illingworth
R Kucharski
R Tatusov
R Wirth
Rajendhran Rajakumar
RC Edgar
RH Crozier
Rick Overson
RR Do Nascimento
S Feng
S Haeder
S Hunter
S Karlin
S Martin
S Raghupathi Rami Reddy
S Tweedie
S van Dongen
S Wasserman
S Wei
Sandra W. Clifton
Sarah E. Marsh
SE Lewis
SF Altschul
Shu Tao
SRR Reddy
Steven C. Slater
Surabhi Nigam
SV de Azevedo
T Belt
T Hinton
T Wicker
Timothy T. Harkins
TR Schultz
UG Mueller
Wesley C. Warren
Y Wang
Z Bao
Publication venue: Public Library of Science
Publication date: 01/02/2011

Leaf-cutter ants are one of the most important herbivorous insects in the Neotropics, harvesting vast quantities of fresh leaf material. The ants use leaves to cultivate a fungus that serves as the colony's primary food source. This obligate ant-fungus mutualism is one of the few occurrences of farming by non-humans and likely facilitated the formation of their massive colonies. Mature leaf-cutter ant colonies contain millions of workers ranging in size from small garden tenders to large soldiers, resulting in one of the most complex polymorphic caste systems within ants. To begin uncovering the genomic underpinnings of this system, we sequenced the genome of Atta cephalotes using 454 pyrosequencing. One prediction from this ant's lifestyle is that it has undergone genetic modifications that reflect its obligate dependence on the fungus for nutrients. Analysis of this genome sequence is consistent with this hypothesis, as we find evidence for reductions in genes related to nutrient acquisition. These include extensive reductions in serine proteases (which are likely unnecessary because proteolysis is not a primary mechanism used to process nutrients obtained from the fungus), a loss of genes involved in arginine biosynthesis (suggesting that this amino acid is obtained from the fungus), and the absence of a hexamerin (which sequesters amino acids during larval development in other insects). Following recent reports of genome sequences from other insects that engage in symbioses with beneficial microbes, the A. cephalotes genome provides new insights into the symbiotic lifestyle of this ant and advances our understanding of host–microbe symbioses.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

White Rose Research Online

Knowledge Specialization in Ph.D. Student Groups

Author: Conti Annamaria
Denas Olgert
Visentin Fabiana
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/02/2014

Researchers have argued that specialization within groups yields productivity gains. We evaluate this statement with a focus on groups of Ph.D. students. Using an established technique in computer science called Latent Dirichlet Allocation, we construct a novel measure of the dispersion of Ph.D. students' research interests based on their dissertation abstracts. We then relate this measure to Ph.D. group publications. For our study, we use a rich dataset on groups of Ph.D. students who studied at a major Swiss university during the 1993-2008 period. We find robust evidence that within-group knowledge specialization is associated with a larger number of publications. However, when specialization increases beyond a critical level, it hinders the group's publication output. We interpret these results as an indication that gains in the amount of research output can be achieved if Ph.D. students specialize according to their comparative advantages. However, beyond a certain level, knowledge specialization has a detrimental impact on research output, due to increasing communication costs and an increased likelihood of conflict insurgence.

Infoscience - École polytechnique fédérale de Lausanne

Maastricht University Research Portal

Knowledge Specialization in PhD Student Groups

Author: A
A Agrawal
A Arora
A Jaffe
Annamaria Conti
B A Jacob
B F Jones
B F Reskin
B Hamilton
B Holmstrom
C Forman
C Knight
D C Hambrick
D M Blei
D P Libaers
D Ramage
E Mansfield
E P Lazear
E T Stuen
F Waldinger
Fabiana Visentin
G Black
G Irie
G Mcmillan
G S Becker
H Etzkowitz
J A Groen
J D Adams
J Groen
J Pleffer
J Singh
K Y Williams
L Wu
M Calderini
M F Porter
M Gruber
M Steyvers
Olgert Denas
P Dasgupta
P E Stephan
P Gosling
R Blundell
R R Nelson
S Belenzon
S G Levin
S Gurmu
S Wuchty
T L Griffiths
T R Zenger
T Shinn
V Mangematin
W Ding
W Ling
Y Song
Publication venue: Elsevier BV
Publication date: 01/01/2013

Crossref

The genome sequence of the leaf-cutter ant Atta cephalotes reveals insights into its obligate symbiotic lifestyle.

Author: Abouheif Ehab
Bornberg-Bauer Erich
Bouffard Pascal
Caldera Eric J
Cash Elizabeth
Cavanaugh Amy
Clifton Sandra W
Currie Cameron R
Denas Olgert
Elhaik Eran
Elsik Christine G
Favé Marie-Julie
Gadau Jürgen
Gerardo Nicole M
Gibson Joshua D
Graur Dan
Grubbs Kirk J
Hagen Darren E
Harkins Timothy T
Helmkampf Martin
Holt Carson
Hu Hao
Johnson Brian R
Kim Jay
Li Lewyn
Marsh Sarah E
Moeller Joseph A
Murphy Marguerite C
Muñoz-Torres Mónica C
Naughton Meredith C
Nigam Surabhi
Overson Rick
Rajakumar Rajendhran
Reese Justin T
Scott Jarrod J
Slater Steven C
Smith Chris R
Smith Christopher D
Suen Garret
Tao Shu
Taylor James
Teiling Clotilde
Tsutsui Neil D
Viljakainen Lumi
Warren Wesley C
Weinstock George M
Wissler Lothar
Yandell Mark D
Zimmer Fabian
Publication venue: eScholarship, University of California
Publication date: 01/02/2011

eScholarship - University of California

An encyclopedia of mouse DNA elements (Mouse ENCODE)

Author: A Sandra Stehling
Alex Dobin
Ali Mortazavi
Amartya Sanyal
Anthony Kiralusha
Audra Johnson
Barbara J Wold
Bing Ren
Brian A Williams
Bryan R Lajoie
Carrie A Davis
Chenghai Xue
Cheryl A Keller
Chris Zaleski
Christopher Morrissey
Daniel Blankenberg
David M Gilbert
David McCleary
Deepti Jain
Diane Trout
Elise A Feingold
Erica Giste
Feng Yue
Gaurav Jain
Gayathri Balasundaram
Georgi K Marinov
Gerd A Blobel
Gilberto Desalvo
Gregory E Crawford
Henry Amrhein
Huaien Wang
James Taylor
Jin Lian
Job Dekker
John A Stamatoyannopoulos
Julien Lagarde
Kate R Rosenbloom
Katherine I Fisher-Aylor
Katrina Learned
Kaun-Bei Chen
Lee Edsall
Leslie B Adams
Manoj Hariharan
Mark Groudine
Marta Byrska-Bishop
Max Pimkin
Meagan Fastuca
Melissa S Cline
Mia Zhang
Michael Bender
Michael Pazin
Michael Snyder
Mitchell J Weiss
Olgert Denas
Peter J Good
Peter J Sabo
Philip Cayting
Rachel Byron
Rajinder Kaul
Rebecca F Lowdon
Richard Sandstrom
Robert E Thurman
Roderic Guigo
Ross Hardison
Samantha Kuan
Sarah Djebali
Sherman M Weissman
Sonali Jha
Stephen G Landt
Swathi A Kumar
Takayo Sasaki
Tejaswini Mishra
Theresa Canfield
Thomas Gingeras
Tyrone Ryba
Vanessa M Kirkup
Vaughan Roach
Venkat S Malladi
W James Kent
Wei Lin
Weisheng Wu
Wulan Deng
Yin Shen
Yong Cheng
Zhen Ye
Zhihai Ma
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.

Crossref

eScholarship@UMMS

UPF Digital Repository

A Comparative Encyclopedia of DNA Elements in the Mouse Genome

Author: Adams Leslie B.
Amrhein Henry
Antoshechkin Igor
Bansal Mukul S.
Bates Daniel
Beal Kathryn
Beer Michael A.
Bender M. A.
Blobel Gerd A.
Boyle Alan P.
Breschi Alessandra
Bussotti Giovanni
Byron Rachel
Canfield Theresa
Cayting Philip
Chang Kai-Hsin
Cheng Yong
Cline Melissa S.
Davis Carrie
De Bruijn Marella
de Sousa Beatriz Lacerda
Denas Olgert
DeSalvo Gilberto
Diegel Morgan
Disteche Christine
Djebali Sarah
Dobin Alex
Dogan Nergiz
Drenkow Jorg
Dunn Douglas
Edsall Lee
Eichler Evan E.
Erickson Drew T.
Euskirchen Ghia
Fastuca Meagan
Feingold Elise A.
Fisher-Aylor Katherine
Flicek Paul
Gilbert David M.
Gingeras Thomas R.
Giste Erika
Good Peter J.
Gosh Srikanta
Groudine Mark T.
Guigo Roderic
Hansen R. Scott
Hardison Ross C.
Harris Robert S.
Haugen Eric
Herrero Javier
Humbert Richard
Jain Deepti
Jansen Camden
Johnson Audra
Josefowicz Steven
Kahveci Tamer
Kaul Rajinder
Kawli Trupti
Keller Cheryl A.
Kellis Manolis
Kent W. James
Kirilusha Anthony
Kirkup Vanessa M
Kuan Samantha
Kundaje Anshul
Kutyavin Tanya
Lagarde Julien
Learned Katrina
Lee Dongwon
Lee Kristen
Levasseur Dana
Li Kanwei
Lian Jin
Lin Shin
Lin Yiing
Lowdon Rebecca F.
Ma Zhihai
Malladi Venkat S.
Marinov Georgi K.
McCleary David
Mishra Tejaswini
Morrissey Christapher S.
Mortazavi Ali
Neph Shane
Neri Fidencio
Notredame Cedric
Orkin Stuart H.
Papayannopoulou Thalia
Pazin Michael J.
Pervouchine Dmitri D.
Pham Long
Pignatelli Miguel
Pope Benjamin D.
Prieto Pablo
Rasmussen Matthew D.
Reh Thomas A.
Ren Bing
Reynolds Alex P.
Rosenbloom Kate R.
Rudensky Alexander
Ryba Tyrone
Rynes Eric
Sabo Peter J.
Samstein Robert
Sandstrom Richard
Santos Miguel Ramalho
See Lei-Hoon
Selleri Licia
Shafer Anthony
Shen Yin
Skoultchi Arthur
Sloan Cricket A.
Snyder Michael P.
Stamatoyannopoulos John
Tanzer Andrea
Taylor James
Thurman Bob
Treuting Piper
Trout Diane
Vierstra Jeff
Vong Shinny
Wang Yanli
Weiss Mitchell J.
Weissman Sherman M.
Wilken Matthew S.
Williams Brian A.
Wold Barbara
Wu Weisheng
Wu Yi-Chieh
Ye Zhen
Yue Feng
Zaleski Chris
Zhang Miaohua
Zhou Xiao-Qiao
Publication venue: Springer Science and Business Media LLC
Publication date: 02/06/2015

As the premier model organism in biomedical research, the laboratory mouse shares the majority of protein-coding genes with humans, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications, and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of other sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

Harvard University - DASH